Categorical Ambiguity and Information Content: A Corpus-based Study of Chinese
Abstract
This paper investigates the degree of categorical ambiguity in Chinese, based on the tagged Sinica Corpus. We propose to use a measure of information content, rather than frequency, to model generalizations about categorical ambiguity. Two important generalizations were found. First, the degree of ambiguity indeed correlates with the number of possible categories that a word has. Second, even though frequently used words are more likely to be categorically ambiguous (Huang et al. 2002), the degree of ambiguity of a word does not depend on its frequency.
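The abstract does not give the formula behind "information content". As a minimal sketch, assuming it is the Shannon entropy of a word's distribution over part-of-speech categories, the following shows how two words with the same number of categories and the same total frequency can still differ in degree of ambiguity. The function names, tag labels, and counts are hypothetical illustrations, not data from the Sinica Corpus.

```python
import math


def category_entropy(tag_counts):
    """Shannon entropy (in bits) of a word's category distribution.

    tag_counts maps a category label to the number of corpus tokens of the
    word tagged with that category. This is an assumed measure; the abstract
    does not state the exact formula used in the paper.
    """
    total = sum(tag_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in tag_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def max_entropy(n_categories):
    """Upper bound on entropy for a word with n_categories possible tags."""
    return math.log2(n_categories) if n_categories > 0 else 0.0


# Hypothetical counts: both words occur 100 times and have two categories.
balanced = {"N": 50, "V": 50}  # evenly split between noun and verb
skewed = {"N": 99, "V": 1}     # almost always a noun

print(category_entropy(balanced))  # at the two-category maximum
print(category_entropy(skewed))    # close to zero, so nearly unambiguous
print(max_entropy(2))
```

Under this assumption, the maximum possible entropy grows with the number of categories, which is consistent with the first generalization. Raw frequency does not enter the formula at all, which is consistent with the second.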
Similar resources
From Quantitative to Qualitative Studies: Developments in Chinese Computational and Corpus Linguistics
Adopting corpus-based empirical approaches to linguistics, this paper has two main goals: the first is to propose formal methodology to extract meaningful quantitative characterizations from Chinese corpora, the second is to achieve generalizations about theoretically significant linguistic qualities based on these quantitative data. The quantitative scales discussed include mutual information,...
Semantic ambiguity effects on traditional Chinese character naming: A corpus-based approach.
Words are considered semantically ambiguous if they have more than one meaning and can be used in multiple contexts. A number of recent studies have provided objective ambiguity measures by using a corpus-based approach and have demonstrated ambiguity advantages in both naming and lexical decision tasks. Although the predictive power of objective ambiguity measures has been examined in several ...
The Use of Second-Person Reference in Advertisement Translation with Reference to Translation between Chinese and English
This research aimed to review the use of second-person reference in advertisement translation, work out the general rules, and provide guidance to translators. Using second-person reference is common in the advertising discourse. Addressing audiences directly involves their attention and in this way enhances their memorization of the advertised message. Second-person reference can be realized v...
A Corpus-Based Study of the Lexical Make-up of Applied Linguistics Article Abstracts
This paper reports results from a corpus-based study that explored the frequency of words in the abstracts of applied linguistics journal articles. The abstracts of major articles in leading applied linguistics journals, published since 2005 up to November 2001 were analyzed using software modules from the Compleat Lexical Tutor. The output includes a list of the most frequent content words, list...
Uyghur-Chinese Translation Disambiguation Method Research Based on Knowledge Automatic-Acquisition
This thesis studies the disambiguation method in Uyghur-Chinese translation, and proposes the design philosophy of automatic-acquisition in translation label library aiming at the deficiency of disambiguation corpus in Uyghur. It refers to the existing Uyghur-Chinese bilingual dictionary, Chinese corpus and the Internet, and acquires the corresponding Chinese translation label examples to Uyghu...
Journal title:
- Journal of Chinese Language and Computing
Volume 14
Publication date: 2002